Skip to content
This repository has been archived by the owner on Jan 14, 2025. It is now read-only.

Add by_band kwarg to batch, allow each lightcurve band to be processed independently #327

Merged
merged 15 commits into from
Dec 21, 2023

Conversation

dougbrn
Copy link
Collaborator

@dougbrn dougbrn commented Dec 14, 2023

Resolves #317
Resolves #241

A bit of scope creep on this PR, the main purpose is to address #317, giving more support to users looking to apply a function to each band within a lightcurve within the batch interface. This is done by adding the by_band kwarg to Ensemble.batch, which makes sure that the band column is part of the groupby and also modifies the output to return an output column for each band. As a result, this also standardizes the output of batch to always return an EnsembleFrame object (even when this keyword is not used), where as previously in some cases it would return a series.

I decided to go ahead with addressing #241 as part of this as well, removing the compute kwarg within Ensemble.batch, I opted not to replace it with persist, as I believe that batch should just return a lazy object that users can chain into a compute or persist command as they see fit.

I also went ahead and added a batch showcase tutorial as part of this PR, in that I tried to outline how this change actually is used, and also tried to be transparent about some issues regarding dataframe outputs.

Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@dougbrn dougbrn changed the title Batch per band attempt2 Add by_band kwarg to batch, allow each lightcurve band to be processed independently Dec 14, 2023
@dougbrn dougbrn force-pushed the batch_per_band_attempt2 branch from 9ba8df0 to 0efba3d Compare December 15, 2023 17:26
Copy link

codecov bot commented Dec 15, 2023

Codecov Report

Attention: 1 lines in your changes are missing coverage. Please review.

Comparison is base (6b2a6a5) 94.56% compared to head (5215892) 94.61%.
Report is 1 commits behind head on main.

Files Patch % Lines
src/tape/ensemble.py 97.22% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #327      +/-   ##
==========================================
+ Coverage   94.56%   94.61%   +0.04%     
==========================================
  Files          24       24              
  Lines        1510     1540      +30     
==========================================
+ Hits         1428     1457      +29     
- Misses         82       83       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@dougbrn dougbrn marked this pull request as ready for review December 19, 2023 18:51
@dougbrn dougbrn requested review from hombit and wilsonbb December 19, 2023 19:27
Copy link
Contributor

@hombit hombit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

I have few clarification questions and a request to make batch() more readable with helper functions

src/tape/ensemble.py Show resolved Hide resolved
src/tape/ensemble.py Outdated Show resolved Hide resolved
src/tape/ensemble.py Outdated Show resolved Hide resolved
src/tape/ensemble.py Outdated Show resolved Hide resolved
src/tape/ensemble.py Outdated Show resolved Hide resolved
src/tape/ensemble.py Outdated Show resolved Hide resolved
src/tape/ensemble.py Show resolved Hide resolved
src/tape/ensemble.py Outdated Show resolved Hide resolved
src/tape/ensemble.py Outdated Show resolved Hide resolved
docs/tutorials/batch_showcase.ipynb Show resolved Hide resolved
@dougbrn dougbrn requested review from hombit and wilsonbb December 20, 2023 00:05
@@ -1032,20 +1032,26 @@ def batch(self, func, *args, meta=None, use_map=True, compute=True, on=None, lab
the results. Overridden by TAPE for TAPE and `light-curve`
functions. If none, attempts to coerce the result to a
pandas.Series.
by_band: `boolean`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add here a sentence about output column names?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rewrote the sentence that tried to explain how the columns affected, let me know if that adds any clarity

@dougbrn dougbrn requested a review from hombit December 20, 2023 20:15
Copy link
Collaborator

@wilsonbb wilsonbb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for doing this, Doug!

Copy link
Contributor

@hombit hombit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look great, thank you so much!

@dougbrn dougbrn merged commit 97a2bfc into main Dec 21, 2023
10 checks passed
@dougbrn dougbrn deleted the batch_per_band_attempt2 branch January 2, 2024 23:11
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve Per-Band Support in Ensemble.batch Ensemble.batch should not compute by default
3 participants